03. Data Analyst Fact Sheet
Here are some quick facts that can help you evaluate whether Data Analysis is the right path for you!
What are the different types of data analysts?
- Data analysis is multidisciplinary in nature, and different analysts may focus on different stages of the analysis process. In a data analysis project, data must first be gathered and wrangled into a form that is easy to work with. The collected data is then explored to understand its structure and the relationships between its features. With this information in hand, data analysts can proceed to data modeling, which involves formally describing the patterns and trends in the data, either to explain the observations or to build functions for predicting future outcomes. Finally, the findings need to be summarized. This often takes the form of visualizations that allow a broader audience to understand the conclusions.
- The steps performed at each stage of the process depend on the other stages. Even if an analyst focuses on one area of data analysis, it is useful to keep in mind how the other steps work in order to inform their own procedures; data analysis is not a completely linear process. Findings in later steps may require us to go back and refine what we did earlier. A broad foundation in all aspects of data analysis is a valuable trait for a successful analyst, as it provides a big-picture view of a full project.
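The stages described above (gather and wrangle, explore, model, summarize) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the `hours`/`score` dataset is made up, and the function names (`wrangle`, `explore`, `fit_line`, `summarize`) are hypothetical labels for the stages, not part of any particular tool.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: hours studied vs. exam score, with one incomplete row.
RAW_CSV = """hours,score
1,52
2,55
3,61
4,
5,70
"""

def wrangle(raw_text):
    """Gather and wrangle: parse the CSV text, keeping only complete rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        if record["hours"] and record["score"]:
            rows.append((float(record["hours"]), float(record["score"])))
    return rows

def explore(rows):
    """Explore: report the row count and the mean of each column."""
    hours, scores = zip(*rows)
    return {"n": len(rows), "mean_hours": mean(hours), "mean_score": mean(scores)}

def fit_line(rows):
    """Model: fit score = intercept + slope * hours by ordinary least squares."""
    hours, scores = zip(*rows)
    mean_x, mean_y = mean(hours), mean(scores)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x in hours))
    return slope, mean_y - slope * mean_x

def summarize(rows, slope, intercept):
    """Summarize: a plain-text stand-in for a visualization."""
    lines = [f"score ~ {intercept:.1f} + {slope:.1f} * hours"]
    for hours, score in rows:
        lines.append(f"{hours:>4.0f} h | {'#' * round(score / 5)}")
    return "\n".join(lines)

rows = wrangle(RAW_CSV)
stats = explore(rows)
slope, intercept = fit_line(rows)
report = summarize(rows, slope, intercept)
print(stats)
print(report)
```

In a real project each stage would be far larger, and an unexpected result in `fit_line` might send the analyst back to `wrangle`, which is the non-linear loop described above.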
What are essential data analyst skills?
- Mathematics and statistics are valuable components of a data analyst’s skill set. To justify conclusions drawn from the data, a data analyst needs to select the proper statistical tests and understand how to interpret the results. Knowledge of math and statistics is also useful for understanding the technical aspects of the models employed, in case the analyst wants to develop new or alternative techniques that go beyond what is built into the software or languages they use.
- Data analysts should be comfortable with programming. While some software packages allow analyses to be performed without programming, a data analyst will inevitably need to do something that cannot be done “out of the box.” The ability to program provides the flexibility to use a wider variety of tools and to customize how the data is explored and understood. Programming can play a big role in every stage of the data analysis process, from data wrangling to exploration and mathematical modeling. Even creating visualizations can require programming experience, particularly for customized, interactive visualizations.
- Data analysts should have a sense of curiosity. A good data analyst asks questions of their data in a way that gives the analysis a logical flow. If an oddity is detected (e.g., missing data, outliers, unexpected trends), then steps should be taken to understand the oddity and try to resolve it.
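As a small illustration of checking for the oddities mentioned above, the following sketch flags missing values and outliers in a list of numbers. It assumes that missing values are represented as `None` and that an outlier is anything beyond 1.5 times the interquartile range from the quartiles (a common rule of thumb, not the only possible definition). The `readings` data and the function names are made up for the example.

```python
from statistics import quantiles

def find_missing(values):
    """Return the positions of missing entries (represented here as None)."""
    return [index for index, value in enumerate(values) if value is None]

def find_outliers(values, k=1.5):
    """Return present values lying more than k * IQR outside the quartiles."""
    present = [value for value in values if value is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in present if value < low or value > high]

# Hypothetical measurements with one gap and one suspicious value.
readings = [12, 14, 15, None, 13, 16, 14, 95, 15]

missing_positions = find_missing(readings)
outliers = find_outliers(readings)
print("missing at positions:", missing_positions)
print("possible outliers:", outliers)
```

Flagging a value is only the first step; the analyst still has to find out whether it is a data entry error, a measurement problem, or a genuine observation before deciding what to do with it.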
What are the differences and similarities of data analysis languages?
- It is useful for an analyst to be comfortable with several different tools. Programs without much flexibility still have their uses: even a simple spreadsheet program can be handy for quickly checking the structure of data. At some point, however, a standalone programming language will be needed to handle data analysis tasks flexibly. In the Data Analyst Nanodegree Program at Udacity, we focus on two languages in particular, R and Python.
Why would one analysis tool or programming language be more suitable for one project over another?
- R is a language built for statistics. It is supported by a very large number of packages that extend the base functionality of the language, some of which are written for very specialized tasks. Packages like ggplot2 make it easy to create good-looking visualizations that speed up exploration of the data, and packages like dplyr and tidyr are useful for reshaping data. Since R is open source and used by many people for statistical analysis, there is a large online community of support. On the other hand, R is fairly specialized in its focus, and it is much harder to use for more general tasks. In particular, when it comes to data wrangling (which can take up a majority of a data analyst’s time), R can be difficult to work with.
- Python is a general-purpose programming language that is up-and-coming in the data analysis world. The breadth of its data analysis packages does not match what is available in R, though packages like scikit-learn, matplotlib and seaborn are expanding the ability to use Python for machine learning and visualization. Python also tends to be easier to learn and understand than R. Its flexibility as a general-purpose language works greatly in its favor as a one-stop language for handling all parts of the data analysis workflow. As noted above, Python is much better suited than R to general processing tasks, such as those performed in data wrangling.
- Ultimately, it can be useful to know at least a little bit about both languages, since they bring their own strengths to the data analysis table.
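To give a sense of the general processing tasks mentioned above, here is a minimal Python sketch that cleans a few messy text records using only the standard library. The record layout (name; date; amount), the two date formats, and the function names are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent spacing, casing, and formats.
RAW_RECORDS = [
    "  alice SMITH ; 2021-03-05 ; $1,200.50 ",
    "Bob jones;05/03/2021;980",
    "carol  lee ; 2021-03-07 ; n/a",
]

# Date formats assumed to appear in the data (ISO, then day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(text):
    """Try each known format; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def parse_amount(text):
    """Strip currency symbols and thousands separators; None if not numeric."""
    cleaned = text.replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

def clean_record(line):
    """Split one raw line into a normalized name, date, and amount."""
    name, day, amount = (field.strip() for field in line.split(";"))
    return {
        "name": " ".join(name.split()).title(),
        "date": parse_date(day),
        "amount": parse_amount(amount),
    }

cleaned = [clean_record(line) for line in RAW_RECORDS]
for record in cleaned:
    print(record)
```

String handling, date parsing, and error handling like this are everyday features of a general-purpose language, which is part of why Python is convenient for wrangling.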
What are some important concepts data analysts need to know?
- Although an analysis project tends to follow a general set of steps, a good analyst stays flexible and returns to earlier steps when needed. For example, additional data wrangling may be required to obtain the information needed for analysis, or more exploratory plots may be needed to understand an unexpected finding during mathematical modeling.
How will I know if I'm ready for a path in Data Analysis?
If you've completed the previous lessons in this Nanodegree Program, you've learned Python, a programming language with a very large number of data analysis libraries. Next, you will learn how to use functions from Python libraries to do data analysis work.
INSTRUCTOR NOTE:
Read about comparisons between R and Python on these blogs: